Telephone for the deaf and method of using same

ABSTRACT

An electronic communications system for the deaf includes a video apparatus for observing and digitizing the facial, body and hand and finger signing motions of a deaf person, an electronic translator for translating the digitized signing motions into words and phrases, and an electronic output for the words and phrases. The video apparatus desirably includes both a video camera and a video display which will display signing motions provided by translating spoken words of a hearing person into digitized images. The system may function as a translator by outputting the translated words and phrases as synthetic speech at the deaf person's location for another person at that location, and that person's speech may be picked up, translated, and displayed as signing motions on a display in the video apparatus.

CROSS REFERENCE TO RELATED APPLICATION

The present application is a continuation-in-part of our application Ser. No. 08/396,554, filed Mar. 1, 1995, now U.S. Pat. No. 5,592,801.

BACKGROUND OF THE INVENTION

The present invention relates to electronic apparatus for communication by the deaf, and, more particularly, to such apparatus which enables the deaf person to communicate through use of sign language.

Deaf people are employed in almost every occupational field. They drive cars, get married, buy homes, and have children, much like everyone else. Because of many inherent communication difficulties, most deaf people are more comfortable when associating with other deaf people. They tend to marry deaf people whom they have met at schools for the deaf or through deaf clubs. Most deaf couples have hearing children who learn sign language early in life to communicate with their parents. Many deaf people tend to have special electronics and telecommunications equipment in their homes. Captioning decoders may be on their television, and electrical hook-ups may flash lights to indicate when the baby is crying, the doorbell is ringing, or the alarm clock is going off.

However, deaf persons have substantial difficulties in communicating with persons at remote locations. One technique utilizes a teletype machine, used by the deaf person both to transmit messages and to receive them; the person with whom the deaf person is communicating also has such a machine, so that there is an effective connection directly between them. In another method, the deaf person utilizes a teletype machine, but the person who is communicating with the deaf person is in contact with a communications center where an operator reads the transmission to the hearing person over the telephone, receives the telephone message from the hearing person, and transmits that information on the teletype machine to the deaf person. Obviously, this teletype-based system is limited and requires the deaf person to be able to manipulate a teletype machine and to understand effectively the written information which he or she receives on it. Processing rapidly received written information is not always effective with those who have been profoundly deaf for extended periods of time. Moreover, a system based upon such teletype transmissions is generally relatively slow.

The widespread availability of personal computers and modems has enabled direct communication with and between deaf persons having such computers. However, it is still required that the deaf person be able to type effectively and readily comprehend the written message being received.

Deaf persons generally are well schooled in the use of finger and hand signing to express themselves, and this signing may be coupled with facial expression and/or body motion to modify the words and phrases which are being signed by the hands and to convey emotion. As used herein, “signing motions” include finger and hand motions, body motions, and facial motions and expressions to convey emotions or to modify expressions generated by finger and hand motions. A written message being received on a teletype machine or computer may not convey any emotional content that may have been present in the voice of the person conveying the message.

Profoundly deaf people communicate among themselves by this sign language on a face-to-face basis, and utilize a Tele-Typewriter (TTY) for telephone communication. The TTY itself leaves much to be desired, since their sign language is a modified syntax of the spoken language, resulting in a smaller vocabulary and lessened ease of reading printed text as a whole (e.g., definite and indefinite articles [“the”, “a”, “an”] are omitted most of the time, and possessives and plurals are not usually distinguished).

When it comes to communication between profoundly deaf persons and normally hearing persons, the problem intensifies. Only a negligible percentage of the non-deaf population is versed in sign language. Thus, some deaf people read lips and utter words similar enough in their vocal resemblance to enable them to be understood. Beyond this tedious and taxing effort, there is virtually no form for such communication except exchanging written notes or having an interpreter involved.

A number of methods for achieving sign recognition have been proposed in the literature. However, in spite of the apparent detail of such articles, they do not go beyond general suggestions, which fail when tested against the development of enabling technology. Major problems have been impeding the success of such enabling technology.

The Kurokawa et al. article entitled “Bi-Directional Transmission Between Sign Language And Japanese For Communication With Deaf-Mute People”, Proceedings of the 5th International Conference on Human Computer Interaction, 2, 1109 (1993), describes how limited recognition of static gestures can be achieved utilizing sensor-based electromechanical gloves; Kurokawa digitizes the electromechanical output of the sensors. Capturing images with a camera is a well known art, but interpreting such images in a consistent way without relying on the human brain for direct interpretation (i.e., machine-interpreted images) has eluded researchers. The Rogers article entitled “Proceedings SPIE-The International Society For Optical Engineering: Applications of Artificial Neural Networks”, IV, 589 (1993), suggests various approaches which cannot work when tested in a real life situation, such as utilizing infrared for signal interpretation. Unfortunately, one cannot combine the technology of Rogers and Kurokawa to solve the problem because the technologies employed are mutually exclusive. If one uses images as Rogers proposes, one cannot obtain from them the information provided by the sensors of the data gloves of Kurokawa; if one uses Kurokawa's gloves, one cannot utilize the camera images to provide any intelligence, knowledge or information beyond what the sensors in the DataGloves provide. Therefore, a fresh approach to the problem is necessary.

Displaying signed motions presents another challenge. A simple database of all possible signed motions, which is an intuitive approach, is rather problematic. To create a lucid signing stream, one needs smooth movement from one word or phrase to another; otherwise, the signing is jerky at best if not totally unintelligible. Although there may have been suggestions for such a database of signing images, this is not a realistic resolution because, for every signed image in the database, one would need an enormous number of connecting movements to other potential gestures, dramatically increasing the size of the database. Selecting a signing stream, inclusive of all the proper intermediary connecting gestures between previous and current images needed for lucid signing presentation, from such an enormous database puts search algorithms to an unrealistic challenge.

Attempts have also been made to transmit digitized signing motions to a central station, as disclosed in Jean-Francois Abramatic et al., U.S. Pat. No. 4,546,383. Even when images are transmitted as proposed by Abramatic et al., the edge detection performed fails to enunciate detail of overlapping hands, or to differentiate between finger spelling and signed motions. All such attempts are restricted by available bandwidth, which curtails wide use of such methods.

It is an object of the present invention to provide a novel electronic communication system for use by deaf persons to enable them to communicate by signing.

It is also an object to provide such an electronic communication system wherein the deaf person and the person communicating with the deaf person do so through a central facility containing a translating means for processing elements of digitized image data.

Another object is to provide such a system in which a hearing person may have his speech converted into digitized signing motions which are displayed to the deaf person.

A further object is to provide a unique method utilizing such an electronic communication system to enable communication by and to deaf persons.

SUMMARY OF THE INVENTION

It has now been found that the foregoing and related objects may be readily attained in an electronic communications system for the deaf comprising a video apparatus for observing and digitizing the signing motions, and means for translating the digitized motions into words and phrases. Also included are means for outputting the words and phrases in a comprehensible form to another, hearing person, generally as artificial speech.

In a telephone-type system, the other person is at a remote location, although the system may also be used as a translator for communication with a person in the immediate vicinity. Generally, the video apparatus is a video camera.

From cost and portability standpoints, the translating means is desirably at a remote location or central station, and transmission means is included for transmitting the digitized signing motions or their digital identifiers to the translating means.

In addition to use of a database of words and phrases corresponding to digitized motions, the translating means also includes artificial intelligence for interpreting and converting the translated motions into words and phrases and into coherent sentences.

The outputting means may convert the coherent sentences into synthetic speech or present the words and phrases in written form.

To enable communication to the deaf person, the system includes means for the other, hearing person to transmit words and phrases. The translating means is effective to translate said words and phrases into digitized signing motions, and the video apparatus includes a display screen which provides an output of the digitized signing motions on the display screen for viewing by the deaf person.

There is included means for translating speech into digital data representing words and phrases, and such digital data into digitized signing motions. Desirably, the video apparatus includes a display screen to provide an output of the digitized motions as signing motions on the display screen for viewing by the deaf person. The video apparatus also includes a microphone and speaker whereby a deaf person may communicate with another person in the immediate vicinity.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic presentation of the steps performed in an electronic communication system embodying the present invention;

FIG. 2 is a schematic representation of a method for connecting an incoming call on the deaf person's telephone to a processing center providing the computer software for the translating functions of the present invention;

FIG. 3 is a schematic representation of the functions when utilizing such a processing center;

FIG. 4 is a schematic presentation of the several steps in the intervention and operation of the processing center when a call is received by the deaf person's telephone;

FIGS. 5a-5c are perspective views of a deaf person's receiver/transmitter installation embodying the present invention in three different forms: using a personal computer and video camera, using a television set with a video camera, and as a public telephone kiosk;

FIG. 6 is a perspective view of the present invention in the form of a cellular telephone;

FIG. 7 is a schematic representation of artificial intelligence used to determine and translate the emotional content in the speech of a hearing person communicating with a deaf person;

FIG. 8 is a diagrammatic representation of the manner in which the screen of a display unit may be divided into sections presenting elements of information in addition to signing motions;

FIG. 9 is a schematic representation of the modules of the artificial intelligence for converting signing into speech;

FIG. 10 is a schematic representation of the modules for creating multiple neural networks and collecting the necessary examples for training these networks;

FIG. 11 is a schematic representation of the modules for controlling the conversion of text to signing animation;

FIG. 12 is a schematic representation of the modules for capturing and compressing the images to be used during the playback of sign language animation;

FIG. 13 illustrates a user of the device wearing special gloves to enhance the ability of the system to identify the signing of the deaf person;

FIGS. 14a-14d illustrate the manner in which the unique shape of the glove makes it possible to recognize the differences between two very similar signs;

FIG. 15 is a schematic representation of the steps to effect translation of English text to American Sign Language (ASL); and

FIG. 16 is a schematic representation of the steps to effect translation of American Sign Language to English text.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Turning first to FIG. 1 of the attached drawings, therein illustrated schematically is an electronic communications system embodying the present invention.

Generally, the deaf person uses sign language in front of a device containing a video camera. The images captured by the camera at 20-30 frames/second are processed by a digital device which does initial and extended image processing. In the processing, each of the frames containing a captured image undergoes a process whereby the image is transformed into manageable identifiers. It is the set of identifiers, in the form of tables of numbers, that travels the normal telephone lines to the central processing facility (i.e., the Center). These identifiers, and not the images themselves, are then correlated with a database of vocabulary and grammar by using artificial intelligence at the Center. Subsequently, syntax rebuilding occurs, again utilizing artificial intelligence, resulting in a complete verbal text which is equivalent to the signed language content. The text then undergoes a text-to-synthesized-speech transformation, and the speech is sent as an analog signal to any ordinary telephone utilized by a hearing person over existing copper or fiberoptic telephone lines. Part of the artificial intelligence referred to above consists of neural networks which are trained for these specific applications.
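
The signing-to-speech path just described can be summarized in code. The following is a minimal C++ sketch of the data flow only; every type and function name (Frame, extractIdentifiers, lookupVocabulary, rebuildSyntax) is an illustrative assumption, and the bodies are stubs standing in for the image processing and neural-network stages:

```cpp
// Minimal sketch of the signing-to-speech pipeline; all names are hypothetical.
#include <iostream>
#include <string>
#include <vector>

struct Frame { std::vector<unsigned char> pixels; };   // one captured image
using Identifiers = std::vector<int>;                  // "tables of numbers"

Identifiers extractIdentifiers(const Frame&) {
    return {17, 4, 92};      // placeholder: real system derives these from the image
}

std::string lookupVocabulary(const Identifiers&) {
    return "HELLO";          // placeholder: database + neural-network correlation
}

std::string rebuildSyntax(const std::vector<std::string>& glosses) {
    std::string text;
    for (const auto& g : glosses) text += g + ' ';     // placeholder syntax rebuilding
    return text;
}

int main() {
    Frame f;                                           // would come from the camera
    Identifiers ids = extractIdentifiers(f);           // at the deaf person's device
    std::string gloss = lookupVocabulary(ids);         // at the Center
    std::string sentence = rebuildSyntax({gloss});     // AI syntax rebuilding
    std::cout << "to text-to-speech: " << sentence << '\n';
}
```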

On the other end of the telephone line, the normally hearing person talks on his or her conventional telephone in the normal and regular way of spoken language. His or her voice is carried on line (in whatever method of transport is utilized by the telephone carrier) to the Center, where speech recognition algorithms convert the spoken word to text. The Center will accommodate appropriate speech recognition (i.e., automatic, continuous and speaker-independent). The recognized speech is then transformed into its equivalent signing content vocabulary and then into text. The text is sent via the telephone lines to the device used by the deaf person and converted to signing animation. Depending upon the transmission line and computer capability of the deaf person's location, the text may be sent as reduced identifiers which are converted into animated images by the deaf person's computer, or as completely formatted animated images. The sign images then appear on the screen of a monitor viewed by the deaf person, resulting in a continuous dynamic set of animated sign language motions which portray the content of the spoken language uttered as speech by the normally hearing person.
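
The return path admits a similarly compact sketch. Again assuming hypothetical names (GesturePointer, textToPointers, playAnimation), this illustrates how text received from the Center could be expanded through reduced identifiers into an animation sequence at the deaf person's device:

```cpp
// Sketch of the return path: text arrives, is mapped to per-gesture
// "reduced identifiers", and stored frames are played back. Illustrative only.
#include <iostream>
#include <string>
#include <vector>

struct GesturePointer { int gestureId; };   // reduced identifier per gesture

std::vector<GesturePointer> textToPointers(const std::string& text) {
    // placeholder: one pointer per word; the real mapping is a vocabulary lookup
    std::vector<GesturePointer> out;
    int id = 0;
    for (char c : text) if (c == ' ') out.push_back({id++});
    out.push_back({id});
    return out;
}

void playAnimation(const std::vector<GesturePointer>& ptrs) {
    for (const auto& p : ptrs)
        std::cout << "render stored frames for gesture " << p.gestureId << '\n';
}

int main() {
    playAnimation(textToPointers("HOW ARE YOU"));   // text received from the Center
}
```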

In view of the computer processing requirements, a preferred form of the present invention includes a processing center containing the sophisticated computer equipment, databases and neural networks to effect the signing/verbal translations, and the communications are conducted through this center. As seen in FIG. 2, a caller (or receiver) and deaf person are actually communicating through such a center. The method of employment of the center is illustrated in FIG. 3, wherein the center receives the input from the video device of the deaf person and provides an audible output to the hearing person who is using a telephone. The hearing person speaks into the telephone and the center provides a video output to the video device of the deaf person.

To avoid excessive costs for a hearing caller, the telephone installation of the deaf person receiving a call may automatically call the center and switch the incoming call to a routing through the center, as is illustrated in FIG. 4.

In FIG. 5a, the deaf person's station comprises a personal computer 30 including the monitor 32 and a video camera 34. In FIG. 5b, a computer unit 36 and a video camera 38 are utilized on top of a standard television set 40 so as to be at hand level. In FIG. 5c, a public kiosk 42 has built into it a video camera 44, a video monitor 46, and lamps 48 to ensure adequate lighting of the user's hands, face and body. To place the call, there is a keypad 50, and a credit card reader may be combined therewith.

A portable transmitter/receiver generally designated by the numeral 8 for use by a deaf person is shown in FIG. 6, and it contains a video camera, the lens 10 of which is disposed in the upright portion 12. In the base portion 13 are an LCD display panel 14 and a key pad 16 for dialing and other functions. Also seen is an antenna 18 for the device so that it may be transported and communicate as a wireless remote or through a cellular telephone network. The device is supported in a stable position and the deaf person is positioned so that the camera lens 10 will record the signing movement of the hands and fingers and body and facial motions and expressions. The signing motions captured by the camera are converted into digital data for processing by the translation software (i.e., artificial intelligence) to produce data representing numbers, words and phrases which are then combined into coherent sentences. As previously indicated, such translation is most economically effected in a dedicated central computer facility. The translated message is then conveyed to the “listener” in either verbal or written form.

The other party may speak into a telephone receiver (not shown) and the verbal expressions are translated by the artificial intelligence into digital data for signs. These signs are displayed on the LCD panel 14.

Since the emotional content of the speech of the other party is not conveyed by signs, the artificial intelligence in the system may provide an analysis of the emotional content of the speech and convey this to the LCD display panel as a separate output. Indicative of the functions of the artificial intelligence software for doing so is the diagrammatic presentation in FIG. 7.

This is portrayed to the deaf person either as a separate image in a corner of the screen which he or she is watching or incorporated into the facial expressions of the animated signing figures.

Turning next to FIG. 8, therein illustrated is a layout for the visual display to present multiple information to the deaf person, such as touchless function buttons, system status indicators, alarms, a printed translation, and a playback of the image being recorded, as well as the signing images and text of the hearing person's responses.

FIGS. 9-12 are schematics of the system software modules for converting signing to speech and speech to animation, including system training methods.

The overall operation of a preferred electronic communications system is set forth hereinafter.

The deaf person uses sign language in front of the transmitter/receiver device containing the camera. The images captured by the camera are of the finger and hand motions and of body motions and of facial expressions and motions, captured by a digital device which does initial processing. In the initial processing, each of the frames containing a captured image undergoes a process whereby the image is collapsed into a small set of fixed identifiers. At the end of the initial processing, the resulting information is sent as data on a regular and designated phone line, using an internal modem in the device, to the data processing center.

The rest of the processing is completed at the center. This includes identification of the letters, numbers and words, conversion to standard sign language, and the conversion to spoken language which results in the equivalent text of the signed content. The text then undergoes a text-to-synthesized-speech transformation and the speech is sent as analog content to the normally hearing person. The voice content may leave the center as data if packet switching (64 kb or 56 kb service) is utilized directly from the center. Processing in the center utilizes artificial intelligence such as neural networks trained for the specific applications of the device.

The normally hearing person who calls a deaf person dials the deaf person's phone number. However, at the deaf person's station, his or her call is connected to the center on a single line which is the deaf person's designated line to the center. The deaf person's device arranges for switching and enables both the caller and his or her station to be on line as a “party call”. The deaf person's station also arranges for the simultaneous transmission of both voice and data on the dedicated line. Thus, the line between the normally hearing person and the deaf person is analog for voice content only, while the line between the deaf person's station (and now the normally hearing person too) and the center is analog but transfers both voice and data.

The normally hearing person's voice undergoes speech recognition in the center and is transformed into the equivalent signing content and then into textual material. The text is sent from the center to the deaf person's device via telephone lines. Software in the device converts the text into reduced identifying pointers for each gesture, which are then converted into animated images which portray in sign language the content of the speech processed in the center.

In a cellular phone, the operation is much the same as with the hard-wired telephone. The camera in the cellular phone transmits the image for initial processing in the cellular phone. From there the reduced data is transmitted to the center for processing. The same switching occurs here as well, and voice/data is sent to the center on the dedicated line assigned for the deaf person. However, in this case the cellular phone maintains two cellular connections on line, one to the center (voice/data) and one to the caller. The deaf person sees the content of the call to him by viewing the LCD display on his cellular phone unit.

When the phone for the deaf is equipped with a microphone and a speaker instead of, or in addition to, a second telephone channel, it may be turned into a communicator. Obviously, one can opt to have both of these options to double the usefulness of the device. The communicator enables the deaf person to conduct a “conversation” with any normally hearing person in close proximity. The signing motions of the deaf person are processed by the center and are transmitted back to the device as a normal voice transmission which the speaker renders as speech to the normally hearing person. His or her speech, in turn, is picked up by the microphone and sent to the center for processing. The result is animated content on the LCD of the communicator which portrays in sign language the spoken content of the normally hearing person.

The modules of the software effecting translation of the signing into and from digital text are set forth in FIGS. 9 and 10, and those for producing the signing animation are set forth in FIGS. 11 and 12. Software presently used for this purpose is appended hereto and is utilized with Borland C++.

A person engaging in the development of other software should consider the following with respect to figure tracking:

-   A. The groups listed below are captured in their separate forms, then added to integrated forms. The integrated forms are then integrated into a single observable signing (i.e., our normalized signing with a single camera), while location information is kept in a separate log. The separate log can have various usages which may not be in their entirety related to signing on the phone. Such can be the case of activating an ATM machine or a food billboard in a drive-in situation.
-   a. Definitions:
    -   L(h) := Left hand
    -   L(a) := Left arm
    -   R(h) := Right hand
    -   R(a) := Right arm
    -   L(H) := Left side of the head
    -   R(H) := Right side of the head
    -   L(T) := Left side of torso
    -   R(T) := Right side of torso
    -   R(f) := Right femur
    -   L(f) := Left femur
    -   R(t) := Right tibia
    -   L(t) := Left tibia
-   B. Section addition with recognition takes place:
    -   b.1. A = L(h) + L(a)
    -   B = R(h) + R(a)
    -   C = L(H) + R(H)
    -   D = L(T) + R(T)
    -   E = L(t) + L(f)
    -   G = R(t) + R(f)
-   c. Signing content (Sc):
    -   Sc = A + B
-   d. Emotional content (Ec):
    -   Ec = C + D
-   e. Pointing and activation (PA):
    -   PA = A + B
-   f. Location in space (Ls):
    -   Ls = E + G + (C + D + A + B)
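
By way of illustration only, the grouping above can be modeled as concatenation of per-region feature data. The Region type and its feature layout are assumptions; this C++ sketch merely mirrors the additions listed:

```cpp
// Sketch of the region grouping: tracked parts are summed into the composite
// groups A..G and then into signing, emotional, pointing and location channels.
#include <vector>

struct Region { std::vector<float> features; };   // assumed per-region tracking data

Region operator+(const Region& x, const Region& y) {   // "section addition"
    Region r = x;
    r.features.insert(r.features.end(), y.features.begin(), y.features.end());
    return r;
}

int main() {
    Region Lh, La, Rh, Ra, LH, RH, LT, RT, Lf, Lt, Rf, Rt;  // captured separately
    Region A = Lh + La;    // left hand + left arm
    Region B = Rh + Ra;    // right hand + right arm
    Region C = LH + RH;    // head halves
    Region D = LT + RT;    // torso halves
    Region E = Lt + Lf;    // left leg
    Region G = Rt + Rf;    // right leg
    Region Sc = A + B;     // signing content
    Region Ec = C + D;     // emotional content
    Region PA = A + B;     // pointing and activation
    Region Ls = E + G + (C + D + A + B);   // location in space
    (void)Sc; (void)Ec; (void)PA; (void)Ls;
}
```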

In seeking to have the software recognize emotional content in the signing or in the speech, the following should be considered:

The emotional content analysis is divided into two separate segments:

-   A. The hearing person segment
-   B. The hearing challenged segment

A. The hearing person segment.

In this segment we analyze four distinct elements in the speech.

-   A.1. Changes in various speech output elements.
-   A.2. Duration of changes recognized in A.1.
-   A.3. Frequency of the changes appearing in A.1.
-   A.4. Frequency of the duration of changes appearing in A.2.

The elements that are analyzed by A.1. through A.4. are:

-   a. Pitch
-   b. Volume
-   c. Non-word elements for which the system is trained (e.g., gasps of air, utterances such as “ah”, chuckles, crying, etc.)
-   d. Repetitions of words and/or word parts (indicating stuttering).
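
As a hedged illustration of analyses A.1 through A.4 applied to one element (pitch), consider the following sketch. The threshold value and the use of sample counts as durations are invented for the example:

```cpp
// Sketch of analyses A.1-A.3 on a pitch track; A.4 would accumulate these
// statistics over time. Threshold and units are illustrative assumptions.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

struct ChangeStats {
    int count = 0;            // A.3: how often changes appear
    double totalDuration = 0; // A.2: how long the changes last (in samples)
};

ChangeStats analyzeChanges(const std::vector<double>& pitch, double threshold) {
    ChangeStats s;
    bool inChange = false;
    for (std::size_t i = 1; i < pitch.size(); ++i) {
        bool changed = std::fabs(pitch[i] - pitch[i - 1]) > threshold;  // A.1
        if (changed && !inChange) ++s.count;
        if (changed) s.totalDuration += 1;                              // A.2
        inChange = changed;
    }
    return s;
}

int main() {
    std::vector<double> pitchTrack = {120, 121, 180, 181, 122, 119, 190};
    ChangeStats s = analyzeChanges(pitchTrack, 30.0);
    std::cout << s.count << " pitch changes, total duration "
              << s.totalDuration << " samples\n";
}
```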

-   B. The hearing challenged person segment.

This segment analyzes combinations of intrafacial positions, where the system utilizes training similar to that for signing, but with different attributes and meanings.

-   a. Definitions and variable states:
    -   U(l) := Upper lip [showing=1, not showing=0]
    -   LL(l) := Lower lip [showing=1, not showing=0]
    -   L(m) := Left part of mouth [compressed=1, uncompressed=0]
    -   R(m) := Right part of mouth [compressed=1, uncompressed=0]
    -   M( ) := Complete mouth as a unit [opened wide=1, closed=0;
        -   compressed and drawn in=4;
        -   compressed and downward=5;
        -   stretched flat=6;
        -   opened with teeth showing=7]
    -   U(t) := Upper front teeth [showing=1; not showing=0]
    -   LL(t) := Lower front teeth [showing=1; not showing=0]
    -   t( ) := Frontal teeth as a whole [shown=1; not shown=0]
    -   R(n) := Right nostril [expanded=1; unexpanded=0]
    -   L(n) := Left nostril [expanded=1; unexpanded=0]
    -   L(cb) := Left cheek bone [raised=1; unraised=0]
    -   R(cb) := Right cheek bone [raised=1; unraised=0]
    -   LO(e) := Left open eye as a whole [distance above pupil=1; no distance above pupil=0]
    -   RO(e) := Right open eye as a whole [distance above pupil=1; no distance above pupil=0]
    -   LC(e) := Left closed eye
    -   RC(e) := Right closed eye
    -   LN(e) := Left eye narrowed
    -   RN(e) := Right eye narrowed
    -   R(b) := Right eyebrow [raised=1; not raised=0]
    -   L(b) := Left eyebrow [raised=1; not raised=0]
    -   N(b) := Nose bridge [two states: compressed=1; uncompressed=0]
    -   F(f) := Frontal forehead [compressed=1; uncompressed=0]

In addition to the emotional content variable Ec, we analyze various combinations as they pertain to emotional expressions of a cultural group. For example:

-   The state of (i.e., showing of) t( )=1 and N(b)=1 means “anger”.
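
That single published rule can be rendered directly in code. The following sketch encodes only the example given (frontal teeth shown together with a compressed nose bridge reads as “anger”); any fuller rule table would be an assumption:

```cpp
// Sketch of the facial-combination rule: t( )=1 and N(b)=1 -> "anger".
#include <iostream>
#include <string>

struct FacialState {
    int frontalTeeth = 0;  // t( ): shown=1, not shown=0
    int noseBridge = 0;    // N(b): compressed=1, uncompressed=0
};

std::string classifyEmotion(const FacialState& f) {
    if (f.frontalTeeth == 1 && f.noseBridge == 1) return "anger";
    return "neutral";      // further combinations would be added per cultural group
}

int main() {
    std::cout << classifyEmotion({1, 1}) << '\n';   // prints "anger"
}
```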

Computer software for speech recognition and conversion to digital data presently exists and may be modified and enhanced for use in the communications system. Exemplary of such software is that of International Business Machines designated “IBM Continuous Speech Recognition Program”. Similarly, commercial software may be used to convert digital data into artificial speech.

Because commercial speech recognition software is not completely accurate, it may be desirable to develop a corrective add-on to increase the accuracy, as set forth hereinafter:

Algorithmic Steps

-   a. Duplicate each incoming analog stream to provide two segments:
    -   1. An untouched segment (Segment A).
    -   2. A processed segment (Segment B).
-   b. Tag each segment with respect to position in the incoming stream.
-   c. Each segment (Segment A) can have variable length.
-   d. Digitize incoming analog stream.
-   e. Operate speech recognition kernel on Segment B.
    -   e.1. Speech recognition kernel.
    -   e.2. Spell checker for word.
    -   e.3. Grammatic checks.
    -   e.4. If recognized and proper, tag as Ra; if unrecognized or improper, tag as Ua.
-   f. Tag each fully (i.e., 100%) recognized word as to its position in Segment B.
-   g. Deduct the recognized words of Segment B in their appropriate position in Segment B from Segment A. The result is Segment C.
    -   g.1. Segment C is tagged to identify its position in Segment A (Position 1).
-   h. Segment C is inserted into a prepared digitized speech section (which contains a message to the speech originator).
-   i. Digital-to-analog conversion takes place.
-   j. The resulting analog speech segment is sent to the speech originator.
-   k. The return from the speech originator is digitized (Segment D).
-   l. Segment D is inserted in Position 1 in Segment A.
-   m. Segment A is declared a 100% recognized segment and is moved to signing dispatch.

Corrective Measures
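
A compressed, illustrative rendering of steps a through m follows, with strings standing in for audio and a stub recognizer; none of the helper names come from the appended software:

```cpp
// Sketch of the corrective add-on: the recognized words of Segment B are
// removed from Segment A, the unrecognized remainder (Segment C) is played
// back to the speaker, and the repeated reply (Segment D) is spliced back in.
#include <iostream>
#include <string>

std::string recognize(const std::string& audio) {
    // placeholder for the recognition kernel + spell/grammar checks (e.1-e.4);
    // pretend the final word came back unrecognized and was dropped
    return audio.substr(0, audio.rfind(' '));
}

int main() {
    std::string segmentA = "please call me tomorrow";       // untouched copy (step a.1)
    std::string recognized = recognize(segmentA);           // Segment B processed (step e)
    std::string segmentC =
        segmentA.substr(recognized.size());                 // unrecognized remainder (step g)
    std::cout << "Could you repeat:" << segmentC << "?\n";  // to the originator (steps h-j)
    std::string segmentD = "tomorrow";                      // originator's reply (step k)
    std::string repaired = recognized + ' ' + segmentD;     // splice at Position 1 (step l)
    std::cout << "100% recognized: " << repaired << '\n';   // to signing dispatch (step m)
}
```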

Corrective measures fall into the following categories:

-   A. Topic Assisted/using Trap words
-   B. Intermediary Agent Assisted
-   C. Speaker Assisted
-   D. Spell Checker Assistance
-   E. Grammatic Assistance

A. Topic Assisted

-   1. Invoking the most common nine words to decide:
    -   1.a. Accent/Country/Location
    -   1.b. Channel to subgroup section [divided into geographic and demographic (cultural) groups]
-   2. Invoke Trap words to locate area of discussion.
-   3. Utilize B-tree [C++, V4+] for list of words possibly matching the word in question.

First Level of Assistance

-   1. This level utilizes trap words in order to determine personal speech patterns.
-   2. Big Nine words are evaluated in 4 tiers: Word[i,j,k,l], i=1, . . . ,n; n=n(a)+n(b), where n(a)=6 and n(b)=6.

Values of n(a) or n(b) can be modified per specific situation.

-   i determines the group most appropriate to determine any of the nine words.
-   S = Total number of words

$$S = \sum_{i=1}^{9} \text{Word}\,[i] = 9$$

Second Level of Assistance

-   1. This level traps words to determine the area of discussion.
    -   j=1, . . . ,10, i.e., ten words for each area of concentration
    -   k=1, . . . ,12, i.e., twelve areas of concentration

$$S(j,k) = \sum_{j=1}^{10} \sum_{k=1}^{12} \text{Word}\,[j,k] = 120$$

Third Level of Assistance

-   1. This level compares unrecognized words against groups of 20 words describing each of the 12 areas.

$$S(i,j,k,l) = \sum_{i=1}^{9} \sum_{j=1}^{10} \sum_{k=1}^{12} \sum_{l=1}^{20} \text{Word}\,[i,j,k,l] = 9 \times 10 \times 12 \times 20 = 21{,}600 \ \text{words}$$
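
The trap-word lookup of step 3 above might be organized as follows. std::map (a balanced tree) stands in here for the B-tree the text mentions, and the sample areas, words, and similarity test are invented for illustration:

```cpp
// Sketch of a trap-word lookup: candidate words per area of concentration
// held in a balanced tree, scanned for rough matches to an unrecognized word.
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    // 12 areas of concentration in the full system; abbreviated here
    std::map<std::string, std::set<std::string>> areas = {
        {"banking",  {"deposit", "loan", "teller"}},
        {"medicine", {"clinic", "doctor", "dose"}},
    };
    std::string unrecognized = "lone";   // word the recognition kernel failed on
    for (const auto& [area, words] : areas)
        for (const auto& w : words)
            if (w.size() == unrecognized.size())    // crude similarity test
                std::cout << area << " candidate: " << w << '\n';
}
```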

If the signer uses American Sign Language, there is a need to effect linguistic analysis beyond what was recognized by William Stokoe in Semantics and Human Sign Language, Mouton (1971), and Sign Language Structure, Linstok Press (1978).

ASL is a visual-spatial language requiring simultaneous, multiple, dynamic articulations. At any particular instant, one has to combine information about the handshape (Stokoe's dez), the motion (Stokoe's sig) and the spatial location of the hands relative to the rest of the body (Stokoe's tab). Supplementing such information, as a word or a meaning is dynamically articulated, are grammatical cues provided in context and requiring attention to detail.

Repetition of words indicates plurality, vibrations signify intensity, and relative spatial distance between cooperating hands specifies magnitude. Further grammatical delineation is contributed by facial expressions. Some of the facial cues are intuitive to human emotions, which simplifies such correlation. For example, the eyebrows when raised indicate surprise, but when drawn down in a frown-like manner signify negation or suspicion. Other facial expressions have no such immediate and intuitive effect, such as those utilizing tongue position. A protruding tongue synchronized with the sign “late” turns the meaning into “not yet”.

Isolated grammatical similarities exist between the two languages, although their utilization in translation differs. Utilizing a number system with its siblings of ordinal numbers, age, or time, as well as compounds, are examples of such similarities.

Translation of compound words in a spoken language is aided by their written presentation as a single unit; when spoken, presentation in a continuous utterance guarantees a unique interpretation which begets a correct translation. “Homework”, “businessman”, “classroom” and “babysitter” are all in daily usage as independent words.

Compounds in ASL are no different from their spoken counterparts, albeit no manual dexterity is required in rapid concatenation of the components. However, in the absence of the external cues accorded the spoken compound in its rapid utterance, a machine translation of an ASL compound word requires a resolving algorithm.

Other routines are mandatory for quality translation involving ASL. For example, word order in the context of a spoken language should be observed. It is set by rules which are consistently applied as a way to achieve unambiguous meaning. Such a strict rule set does not exist in ASL. However, the appearance that ASL is more lax and forgiving in its scrutiny of order, and thus prone to ambiguity in the resulting meaning, is misleading. There are rules in ASL for breaking the rules. In fact, a particular word order rule is a corollary of a prevailing situation conveyed by the signer. Hence, there is a rule for selecting the rule of a particular word order, which together supply supplemental meaning to the sentence while enabling a shorter exposition. The economy of exposition achieved contributes to more efficient communication for the signing parties. A subtle but clear message is conveyed by such order. Sentences with classifiers indicating locations appear with the order Object, Subject, Verb, while Subject preceding Object which precedes Verb singularly indicates inflected verbs. Translation algorithms which treat even the most subtle of ASL idiosyncrasies as rules, emanating from and borne out of a need to improve efficient and economic communication, will attain a higher level of comprehensive quality.
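
The two word-order rules just stated are mechanical enough to sketch. This fragment encodes only those two cases; real ASL ordering involves many more situations than this assumes:

```cpp
// Sketch of word-order selection: classifier sentences indicating location use
// Object-Subject-Verb, while Subject-Object-Verb order signals an inflected verb.
#include <array>
#include <iostream>
#include <string>

enum class SentenceKind { LocativeClassifier, InflectedVerb };

std::array<std::string, 3> order(SentenceKind k, const std::string& subj,
                                 const std::string& obj, const std::string& verb) {
    if (k == SentenceKind::LocativeClassifier) return {obj, subj, verb};  // O S V
    return {subj, obj, verb};                                             // S O V
}

int main() {
    for (const auto& w : order(SentenceKind::LocativeClassifier, "CAT", "TABLE", "SIT-ON"))
        std::cout << w << ' ';
    std::cout << '\n';   // prints: TABLE CAT SIT-ON
}
```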

The software in FIGS. 15 and 16 handles various translation issues which need resolution before an acceptable translation can follow. Issues of word order in ASL, such as the word order just discussed, are germane to the language itself.

Cultural issues require attention right from the outset. The ASL finger-spelled letter “T” viewed in Europe, or ASL signs spatially located relative to the person's midsection viewed in China, will be locally construed as pejorative. Hence, identification of the expression in the context of the intended recipient may cause the format of delivery to undergo an appropriate substitution. Therefore, the algorithms as related to telephone communication try to identify the recipient's cultural base or geography prior to dispatch, so that the algorithmic routines for appropriate adjustments can be invoked.

Notwithstanding such efforts, the advanced group of algorithms is far from being comprehensive, and represents only the first step in a much deserving subject. FIG. 15 shows the essential components of an English to ASL translation algorithm, while FIG. 16 shows the ASL to English translation algorithm.

As will be appreciated, there is a substantial problem in effectuating real time transmission of the data as to images because of the need for compression even after discarding superfluous information. If we consider a video camera with 640 horizontal pixels and 480 lines, a single frame amounts to 307,200 bytes or 2.4576 Mbits. When considering real time operation at 30 frames/sec, this would require 73.728 Mbits/sec. Obviously, a bottleneck will result in the transfer to and from any acceptable storage medium. Furthermore, to utilize telephone lines in a meaningful way, such as at 56 kilobits/second or even at 64 kilobits/second, it would take over 20 minutes to transfer one second of video data. Using compression would mean a compression rate of over 1,000:1. Even resorting to compressing the data by utilizing wavelets, the level of resulting quality would be questionable. The other alternative is typically to transmit fewer frames per second, but this is an unacceptable method as it results in jerky motions and makes it difficult to interpret visual signing gestures.
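
Worked out explicitly, the arithmetic behind these figures is:

$$640 \times 480 = 307{,}200\ \text{bytes/frame} \times 8 = 2{,}457{,}600\ \text{bits} = 2.4576\ \text{Mbits}$$

$$2.4576\ \text{Mbits/frame} \times 30\ \text{frames/s} = 73.728\ \text{Mbits/s}$$

$$\frac{73.728\ \text{Mbits}}{56\ \text{kbits/s}} \approx 1{,}316\ \text{s} \approx 22\ \text{minutes per second of video}, \qquad \frac{73{,}728}{56} \approx 1{,}316{:}1\ \text{compression}$$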

In the present invention, the preferred approach is to avoid the conventional approach of trying to force some compression scheme on the data, and instead bring the data down from the frame level to a Reduced Data Set (RDS).
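
One plausible shape for an RDS record is sketched below. The specific fields are assumptions, as the patent does not fix a layout, but they convey the point: a frame reduces to tens of bytes instead of hundreds of kilobytes:

```cpp
// Sketch of a Reduced Data Set record: each frame is boiled down to a handful
// of numbers per tracked region instead of compressed pixels. Fields assumed.
#include <array>
#include <cstdint>

struct RegionDescriptor {
    std::int16_t x, y;         // position of the region in the frame
    std::int16_t shapeId;      // index into a trained shape vocabulary
    std::int16_t orientation;  // coarse rotation bucket
};

struct RdsFrame {
    std::uint32_t frameNumber;
    std::array<RegionDescriptor, 6> regions;  // hands, arms, head, torso groups
};

static_assert(sizeof(RdsFrame) < 64, "a frame fits in tens of bytes, not kilobytes");

int main() { return 0; }
```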

It will be appreciated that another significant aspect of the invention is the requirement that finger spelling be captured by the camera, undergo the RDS process, and still be recognized once artificial intelligence procedures are invoked. This task can be difficult because the frame grabber has to capture the signed gesture against the ambient surroundings, other body parts of the signing person, and clothes. Preferably, the system uses special gloves which allow discrimination of the hands from the background by the image processing system.

Turning now to FIGS. 13 and 14, therein illustrated is the benefit of using special gloves to enhance the ability of the system to recognize important detail of the hand shapes during the actual gesturing of sign language. Many times the hands are overlapping or touching each other. Video separation of left from right is accomplished by color separation, using different saturated colors for each hand. For example, the fingers of the right hand can be distinctly green and the fingers of the left hand distinctly blue. In addition, each glove has a third color (typically red) for the left and right palm areas. This allows hand shape and finger details to be seen whenever the hand is closed vs. opened and when the palm is disposed toward the camera vs. away.
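
A minimal sketch of that color separation, with illustrative (uncalibrated) RGB thresholds:

```cpp
// Sketch of the glove color separation: saturated green marks right-hand
// fingers, blue the left-hand fingers, red the palms, so overlapping hands
// remain distinguishable. Thresholds are assumptions, not calibrated values.
#include <cstdint>
#include <iostream>

enum class HandPart { RightFingers, LeftFingers, Palm, Background };

HandPart classify(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    if (g > 200 && r < 80 && b < 80) return HandPart::RightFingers;  // saturated green
    if (b > 200 && r < 80 && g < 80) return HandPart::LeftFingers;   // saturated blue
    if (r > 200 && g < 80 && b < 80) return HandPart::Palm;          // red palm patch
    return HandPart::Background;
}

int main() {
    // one sample pixel from each glove region
    std::cout << (classify(10, 240, 20) == HandPart::RightFingers) << '\n';  // 1
    std::cout << (classify(15, 30, 230) == HandPart::LeftFingers) << '\n';   // 1
    std::cout << (classify(250, 12, 9) == HandPart::Palm) << '\n';           // 1
}
```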

The same type of RDS is utilized in recreating images, frame by frame, in real time, which will be displayed on the deaf person's monitor. These images will appear as smooth, continuous animation which will be easy to recognize. This is because the recreation of this animation is a result of actual frame by frame information which has been captured from a live subject and put into memory. The RDS takes up minimal memory and yet is completely on demand, interactive, and operates at real time speed.

At the end of the speech recognition from the hearing person's voice and the text building procedure, the various words are assembled into their counterpart animated signing gestures: starting with the table of data generated from the text that was transmitted from the center, the frame by frame recreation for each gesture is performed, employing special algorithms for transitional frames between gestures, and the gestures are then displayed in sequence on the deaf person's monitor.
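
The transitional-frame step can be sketched as interpolation between the last pose of one stored gesture and the first pose of the next. Linear blending is an assumption; the text says only that “special algorithms” are employed:

```cpp
// Sketch of gesture concatenation with interpolated transitional frames.
#include <cstddef>
#include <iostream>
#include <vector>

using Pose = std::vector<double>;   // joint angles for one frame

std::vector<Pose> transition(const Pose& a, const Pose& b, int steps) {
    std::vector<Pose> frames;
    for (int s = 1; s <= steps; ++s) {
        Pose p(a.size());
        for (std::size_t j = 0; j < a.size(); ++j)
            p[j] = a[j] + (b[j] - a[j]) * s / (steps + 1);  // blend between gestures
        frames.push_back(p);
    }
    return frames;
}

int main() {
    std::vector<std::vector<Pose>> gestures = {{{0.0, 0.0}}, {{1.0, 2.0}}};
    std::vector<Pose> stream;
    for (std::size_t g = 0; g < gestures.size(); ++g) {
        if (g > 0) {   // bridge the previous gesture into this one
            auto t = transition(gestures[g - 1].back(), gestures[g].front(), 3);
            stream.insert(stream.end(), t.begin(), t.end());
        }
        stream.insert(stream.end(), gestures[g].begin(), gestures[g].end());
    }
    std::cout << stream.size() << " frames in the final animation\n";  // 5
}
```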

The illustrated embodiments all utilize a single video camera. It may be desirable to utilize more than one camera to allow the signing person “free” movement in his or her environment and to track spatial positions in that environment.

In such a case, the installation should satisfy the following criteria:

-   1. Each camera covers a separate angle.
-   2. Each camera operates independently of the other(s).
-   3. Angle overlap may or may not be permitted according to the pre-signing calibration.
-   4. Integration of input from the multiple cameras is performed.
-   5. A defined figure with signing motions (where applicable) is rendered in conformity with allowable images (for persons). The same technique is useful in defining any objects or alive, stationary or moving entities, such as animals.
-   6. Movements without signing are classified as null figures (coordinates are preserved).
-   7. The animated form of the signing figure can be shown in an “abbreviated” form when the person is not signing, that is, a figure not well defined with specific locations of fingers, etc. Such animated figures can occur for all null figures.

Recently, three-dimensional video cameras have been developed. The use of such devices may facilitate recognition of signing motions by enhancing spatial differences.

Thus, it can be seen that the electronic communications system of the present invention provides an effective means for translating signing motions to speech or text for a hearing party using only a normal telephone at the hearing party's end of the line, and for translating speech to signing motions which are conveyed to the deaf party. The system may function as a telephone for the deaf, or as an on-site translator.

CLAIMS

1. An electronic communications system for the deaf comprising: (a) a video apparatus for visually observing the images of facial and hand and finger signing motions of a deaf person and converting the observed signing motions into digital identifiers; (b) means for translating said digital identifiers of said observed signing motions into words and phrases; (c) means for outputting said words and phrases generated by the visual observation of said signing motions in a comprehensible form to another person; (d) a receiver for receiving spoken words and phrases of another person and transmitting them; (e) means for translating said spoken words and phrases into a visual form which may be observed by the deaf person; and (f) means for outputting said visual form of said spoken words and phrases on said video apparatus for viewing by the deaf person.
2. The electronic communications system in accordance with claim 1 wherein said another person is at a remote location.

3. The electronic communications system in accordance with claim 1 wherein said video apparatus includes a video camera and image capture and processing hardware and software.

4. The electronic communications system in accordance with claim 1 wherein said translating means is located at a central station with which said video apparatus and said receiver and outputting means are in communication.

5. The electronic communications system in accordance with claim 1 wherein said translating means also includes artificial intelligence for interpreting and converting the translated signing motions into words and phrases and into coherent sentences.

6. The electronic communications system in accordance with claim 5 wherein said outputting means converts said coherent sentences into synthetic speech.

7. The electronic communications system in accordance with claim 1 wherein said outputting means converts said spoken words and phrases into written form.

8. The electronic communications system in accordance with claim 1 wherein said video apparatus includes a display screen.

9. The electronic communications system in accordance with claim 8 wherein said video apparatus provides an output of said spoken words and phrases as signing motions on said display screen for viewing by the deaf person.

10. The electronic communications system in accordance with claim 1 wherein said video apparatus includes a display screen to provide an output of said spoken words and phrases as signing motions on said display screen for viewing by the deaf person, and wherein said video apparatus includes a microphone and speaker whereby a deaf person may communicate with another person in the immediate vicinity.
11. The electronic communications system in accordance with claim 10 wherein said translating means is located at a central station with which said video apparatus and said receiver and outputting means are in communication.

12. In a method for electronic communication for the deaf comprising: (a) visually observing the images of facial and hand and finger signing motions of a deaf person and converting the observed signing motions into digital identifiers; (b) translating said digital identifiers of said observed signing motions into words and phrases; (c) outputting said words and phrases in a comprehensible form to another person; (d) receiving speech from said another person; (e) translating said speech of said another person into signing motions; and (f) displaying said signing motions representing said speech to said deaf person.
13. The electronic communications method in accordance with claim 12 wherein said another person is at a remote location.

14. The electronic communication method in accordance with claim 13 wherein said step of outputting at a remote location is effected by transmission of said translated words and phrases to a communications device receiver at said remote location.

15. The electronic communication method in accordance with claim 12 wherein said step of observing and converting the signing motions is effected by a video camera.

16. The electronic communication method in accordance with claim 12 including the step of transmitting said digital identifiers of said motions and said speech electronically to a central station where said translating steps are performed.
17. The electronic communication method in accordance with claim 12 wherein said outputting step provides such words and phrases as synthetic speech.

18. The electronic communication method in accordance with claim 12 wherein said outputting step provides said words and phrases in written form to said another person.
19. The electronic communication method in accordance with claim 12 wherein said displaying step provides said words and phrases in written form.

20. The electronic communication method in accordance with claim 12 wherein said translating step utilizes artificial intelligence.

21. The electronic communication method and software in accordance with claim 20 wherein said intelligence is developed with the use of multiple neural networks automatically created and assigned by gesture type.

22. The electronic communication method in accordance with claim 12 wherein said another person and said displaying step are at the same location as said deaf person and wherein said visually observing and converting step utilizes a video apparatus.

23. The electronic communication method in accordance with claim 22 wherein said receiver and outputting steps are conducted by components of an installation including said video apparatus.

24. The electronic communication method in accordance with claim 22 wherein said translating steps are conducted at a remote center.

25. The electronic communication method in accordance with claim 12 wherein said translating steps are conducted at a remote center.
26. An electronic communications system for the deaf comprising: (a) a video apparatus for visually observing the images of facial and hand and finger signing motions of a deaf person and converting the observed signing motions into digital identifiers; (b) means for translating said digital identifiers of said observed signing motions into words and phrases; (c) means for outputting said words and phrases generated by the visual observations of said signing motions in a comprehensible form to another person; (d) a receiver for receiving spoken words and phrases of another person and transmitting them; (e) means for translating said spoken words and phrases into signing motions which may be observed by the deaf person; and (f) means for outputting said signing motions on said video apparatus for viewing by the deaf person, said translating means being located at a central station with which said video apparatus and receiver are in communication.
27. An electronic communications system for the deaf in accordance with claim 26 wherein said another person is at a remote location.

28. An electronic communications system for the deaf in accordance with claim 26 wherein said video apparatus includes a video camera and image capture and processing hardware and software.

29. An electronic communications system for the deaf in accordance with claim 26 wherein said translating means also includes artificial intelligence for interpreting and converting the translated motions into words and phrases and into coherent sentences.

30. An electronic communications system for the deaf in accordance with claim 28 wherein said outputting means converts said coherent sentences into synthetic speech.

31. An electronic communications system for the deaf in accordance with claim 26 wherein said video apparatus includes a display screen.

32. An electronic communications system for the deaf in accordance with claim 26 wherein said video apparatus includes a display screen to provide an output of said spoken words and phrases as signing motions on said display screen for viewing by the deaf person, and wherein said video apparatus includes a microphone and speaker whereby a deaf person may communicate with another person in the immediate vicinity.
33. An electronic communications system for the hearing impaired comprising: a receiver for receiving spoken words and phrases; means for translating said spoken words and phrases into a visual form which may be observed by a hearing impaired person; said translating means including means for transforming said spoken words into equivalent signing content and then into textual material; means for outputting said textual material for display on a device utilized by said hearing impaired person; said device utilized by said hearing impaired person including means for receiving words and phrases from the hearing impaired person; said transforming means converting said words and phrases from the hearing impaired person into a form which may be presented to a hearing person; means for outputting said converted words and phrases from said hearing impaired person; and said device utilized by said hearing impaired person comprising a personal computer which includes a monitor and which further includes a video camera for capturing facial, hand, and finger signing motions generated by said hearing impaired person.
34. An electronic communications system according to claim 33, wherein said translating means are located in a station remote from said hearing impaired person and said hearing person.

35. An electronic communications system according to claim 33, further comprising means for converting said captured signing motions into a plurality of identifiers and means for transmitting said plurality of identifiers to said translating means.

36. An electronic communications system according to claim 35, wherein said transmitting means comprises at least one telephone line.
37. An electronic communications system according to claim 35, wherein said translating means includes means for correlating said identifiers with a vocabulary and grammar database.

38. An electronic communications system according to claim 33, wherein said translating means includes artificial intelligence means for providing an analysis of the emotional content of said spoken words and wherein said system further comprises means for separately conveying said emotional content to said device utilized by said hearing impaired person.
39. An electronic communications system according to claim 33, wherein said device has means for converting textual material received from said translating means into reduced identifying pointers and for converting said reduced identifying pointers into animated images which portray in sign language the content of the spoken words and phrases.

40. An electronic communications system according to claim 33, wherein said device utilized by said hearing impaired person is located in a kiosk.
41. An electronic communications system according to claim 33, wherein said device utilized by said hearing impaired person comprises a portable transmitter/receiver.

42. An electronic communications system according to claim 33, wherein said output means comprises means for transmitting said text via telephone lines and said device used by said hearing impaired person includes means for converting said transmitted text to animated images.

43. An electronic communication system for the hearing impaired comprising: a receiver for receiving spoken words and phrases; means for translating said spoken words and phrases into a visual form which may be observed by a hearing impaired person; said translating means including means for transforming said spoken words into equivalent signing content and then into textual material; means for outputting said textual material for display on a device utilized by said hearing impaired person; said device utilized by said hearing impaired person including means for receiving words and phrases from the hearing impaired person; said system including a video apparatus for visually observing any images of facial and hand and finger signing motions of the hearing impaired person and converting any observed signing motions into digital identifiers; said transforming means converting said words and phrases from the hearing impaired person into a form which may be presented to a hearing person; said transforming means including means for translating said digital identifiers of said observed signing motions into words and phrases; means for outputting said translated words and phrases from said hearing impaired person; and said outputting means including means for outputting said words and phrases generated by the visual observation of said signing motions in a comprehensible form to another person.